An Empirical Evaluation for Feature Selection Methods in Phishing Email Classification
Authors
Abstract
Phishing email detection depends heavily on the accuracy of anti-phishing classifiers. According to the literature, classifiers that use Machine-Learning techniques achieve the highest phishing email classification accuracy. Using effective features is a critical step in raising a Machine-Learning classifier's detection accuracy. This study evaluates a number of feature subset selection methods as they relate to the phishing email classification domain. To perform this study, a total of 47 classification features were constructed as previously proposed in the literature. The primary outcome is that the Wrapper evaluator combined with the Best-First: Forward searching method found the most effective features subset among all evaluated methods. This study addresses the gap between fragmented literature items by evaluating them together under common evaluation metrics. Using the best-performing feature selection method, an effective features subset was found among the 47 previously proposed features, which resulted in a highly accurate anti-phishing email classifier with an f1 score of 99.396%. This also shows that a highly competitive anti-phishing email classifier can still be constructed using only existing Machine-Learning techniques and previously proposed features, provided an effective features subset is found. Thanks to Buhooth for funding this work. A preliminary version of this work was published in [13].
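As an illustration only, the idea of a wrapper evaluator driven by a best-first forward search, scored by F1, can be sketched as follows. Everything concrete here is an assumption made so the example runs on its own: the nearest-centroid classifier, the synthetic five-feature data, the function names, and the `patience` stopping rule. None of it is the study's 47 features, its classifier, or its tooling.

```python
import heapq
import random
from statistics import mean


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def centroid_predict(train_X, train_y, test_X, cols):
    """Hypothetical stand-in classifier: nearest centroid on columns `cols`."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [mean(r[c] for r in rows) for c in cols]

    def sq_dist(x, label):
        return sum((x[c] - m) ** 2 for c, m in zip(cols, centroids[label]))

    return [min(centroids, key=lambda lab: sq_dist(x, lab)) for x in test_X]


def cv_f1(X, y, cols, folds=5):
    """Wrapper evaluation: mean F1 of the classifier over strided folds."""
    scores = []
    for k in range(folds):
        test_idx = set(range(k, len(X), folds))
        train = [i for i in range(len(X)) if i not in test_idx]
        test = sorted(test_idx)
        preds = centroid_predict([X[i] for i in train], [y[i] for i in train],
                                 [X[i] for i in test], cols)
        scores.append(f1_score([y[i] for i in test], preds))
    return mean(scores)


def best_first_forward(X, y, n_features, patience=5):
    """Best-first search that grows subsets one feature at a time.

    Stops after `patience` consecutive expansions without a new best score.
    """
    start = frozenset()
    best_subset, best_score = start, 0.0
    frontier = [(0.0, [], start)]  # min-heap keyed on negated score
    seen = {start}
    stale = 0
    while frontier and stale < patience:
        _, _, subset = heapq.heappop(frontier)  # most promising open subset
        improved = False
        for f in range(n_features):
            child = subset | {f}
            if child in seen:
                continue
            seen.add(child)
            score = cv_f1(X, y, sorted(child))
            heapq.heappush(frontier, (-score, sorted(child), child))
            if score > best_score + 1e-9:
                best_subset, best_score, improved = child, score, True
        stale = 0 if improved else stale + 1
    return sorted(best_subset), best_score


# Synthetic data (assumption): feature 0 is strongly informative, feature 1
# weakly informative, features 2-4 are pure noise. Label 1 = phishing.
rng = random.Random(0)
y = [rng.randint(0, 1) for _ in range(200)]
X = [[2 * lab + rng.gauss(0, 0.5), lab + rng.gauss(0, 1),
      rng.gauss(0, 3), rng.gauss(0, 3), rng.gauss(0, 3)] for lab in y]

selected, score = best_first_forward(X, y, n_features=5)
print("selected feature indices:", selected, "mean CV F1:", round(score, 3))
```

The point of the wrapper design is that each candidate subset is judged by actually training and scoring the classifier on it, which is more expensive than a filter-style ranking but measures what ultimately matters, the classifier's own F1.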
Similar resources
Optimal Feature Selection for Data Classification and Clustering: Techniques and Guidelines
In this paper, principles and existing feature selection methods for classifying and clustering data are introduced. To that end, categorizing frameworks for finding selected subsets, namely search-based and non-search-based procedures, as well as evaluation criteria and data mining tasks are discussed. In the following, a platform is developed as an intermediate step toward developing an intell...
Feature Extraction or Feature Selection for Text Classification: A Case Study on Phishing Email Detection
Dimensionality reduction is generally performed when high dimensional data like text are classified. This can be done either by using feature extraction techniques or by using feature selection techniques. This paper analyses which dimension reduction technique is better for classifying text data like emails. Email classification is difficult due to its high dimensional sparse features that aff...
A Novel Architecture for Detecting Phishing Webpages using Cost-based Feature Selection
Phishing is one of the luring techniques used to exploit personal information. A phishing webpage detection system (PWDS) extracts features to determine whether a page is a phishing webpage or not. Selecting appropriate features improves the performance of PWDS. Performance criteria are detection accuracy and system response time. The major time consumed by PWDS arises from feature extraction that ...
A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization
Spam is unwanted email that harms communications around the world. It is a growing problem for personal email, so detecting it is essential. Machine learning is very useful for this problem: thanks to its adaptive nature, it shows good results in learning the patterns required for classification. Nonetheless, in spam detection, there are a large num...
Journal:
- Comput. Syst. Sci. Eng.
Volume 28
Publication date: 2013